SDA 3.5 Documentation for REGRESS
NAME
regress - multiple regression
USAGE
regress -b batchfile
DESCRIPTION
REGRESS carries out a conventional regression analysis, using
ordinary least squares, for specified input variables. A weight
variable can be used to give different weights to each case, and
filter variables may be used to exclude some of the cases. If a
case has missing data on ANY of the specified variables, it is
excluded from all the calculations.
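The exclusion rule above (often called listwise deletion) can be illustrated with a minimal Python sketch. It is not SDA's implementation: the function name `listwise_delete`, the dictionary layout of a case, and the use of `None` to mark missing data are all assumptions made for illustration.

```python
def listwise_delete(cases, variables):
    """Keep only the cases that have a value for every listed variable.

    Missing data is represented by None (an assumption of this sketch).
    """
    return [
        case for case in cases
        if all(case.get(var) is not None for var in variables)
    ]


# Hypothetical data: two of the four cases are incomplete.
cases = [
    {"spend": 120.0, "age": 34, "educ": 12},
    {"spend": None, "age": 51, "educ": 16},   # missing dependent variable
    {"spend": 80.0, "age": 27, "educ": None},  # missing one predictor
    {"spend": 95.0, "age": 45, "educ": 14},
]

complete = listwise_delete(cases, ["spend", "age", "educ"])
```

Only the first and last cases survive, so every statistic in the run would be computed from the same set of cases.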
Recodes, dummy variables, and product terms can be generated
temporarily within the program itself, so that the user will not
have to create such variables before running a regression.
Ordinarily this program is invoked by the Web interface for the
SDA programs, and the user does not have to deal with the
keywords given in this document. Output from the program is in
HTML, which can be viewed with a Web browser.
It is also possible to run the program directly by preparing a
command file, which specifies the variables to be analyzed and
the options to use. This document explains how to prepare such a
file. The name of this batch command file is specified to the
program after the ‘-b’ option flag.
KEYWORDS
The batch file contains specifications for the analysis. These
specifications are given in the form "keyword = something" with
one keyword per line. Keywords may be given in any order, either
in upper or in lower case. The valid keywords are as follows
(with significant characters shown in capital letters):
Basic Keywords
Keyword        Possible Specification          Default (if no keyword)
_____________________________________________________________________
STUdy=         path of dataset directory       Look for variables in
                                               current directory only
DEP=           name of dependent variable      REQUIRED
INDep=         names of independent vars       REQUIRED
               (separated by spaces/commas)
Weight=        name of weight variable         No weighting
Filter=        name(s) and codes of filter     No filter
               variable(s)
STRatum=       name of variable giving         No stratification for
               sample stratum                  computing standard errors
               $1: Force one stratum
CLuster=       name of variable giving         No cluster variable for
               sample cluster                  computing standard errors
NDECimals=     number of decimals for main     3 decimal places
               results (coefficients, SE’s)
SAvefile=      filename to receive output      Output sent to screen
               (overwrite existing file)       (standard output)
DUMMYgenmax=   A number between 1 and 100      Max of 25 dummy vars can be
               (max dummy vars)                generated by the "m:" syntax
                                               for a single categorical var
GVARCase=      LOWER or UPPER                  No force to lower/upper case
Display Options
Keyword        Possible Specification          Default (if no keyword)
_____________________________________________________________________
COLORcoding=   Yes                             No color coding of cells
                                               or colored headings
LAnguagefile=  Name of file with non-English   English labels on
               labels and messages             output
RUNtitle=      Title or comments for run       No title or comments
SHORTlist=     Yes                             Output list of all
                                               independent variables
TExt=          Yes                             No text for variables
Other Statistics
In addition to the main results, one or more of the following
optional statistics can be displayed. You can specify the desired
number of decimal places in parentheses if the defaults, listed
below, are not satisfactory.
Note that the ‘otherstats=’ keyword can be repeated on subsequent
lines if necessary.
Keyword        Possible Specification          Default (if no keyword)
_____________________________________________________________________
OTHERstats=
               TTests (ndec)                   No T-tests
               FTest (ndec)                    No Global F-test
               UNIvariate (ndec)               No univariate stats
               BPRODuct (ndec)                 No B*coefficient stats
               CORel (ndec)                    No correlation matrix
               COVar (ndec)                    No covariance matrix
               COEFF (ndec)                    No covar of coefficients matrix
               CONF (90, 95, or 99)            No confidence intervals
               (‘CONF’ alone gives 95% CI)
DECIMAL PLACES
Each statistic has a default number of decimal places with which
it will be printed. The default for the main results (regression
coefficients, standard errors, and confidence intervals) is 3
decimal places. The ‘NDECimals=’ keyword changes the number of
decimals output for those statistics.
For the other (optional) statistics, the default numbers of
decimals are as follows:
- T-statistics and P-Values: 3 decimals
- Global F-test: 3 decimals
- Univariate statistics: 2 decimals
- B*Coefficient statistics: 2 decimals
- Correlation matrix: 2 decimals
- Covariance matrix: 2 decimals
- Covariance of coefficients matrix: 6 decimals
To change the number of decimals for these statistics, put
the desired number of decimals in parentheses after the name of
the statistic, as in ‘ftest(4)’. Note that requesting the BPRODUCT
statistics will force the output of the univariate statistics as
well, and the specification of decimal places for the BPRODUCT
statistics will override any specification of decimal places for
the univariate statistics.
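The decimal-place rules above can be sketched in Python as follows. This is an illustrative reading of the documentation, not SDA's own parser: the function `parse_otherstats`, the dictionary keyed by each statistic's significant characters, and the matching of names by those leading characters are assumptions. The sketch also assumes that a BPRODUCT request without explicit decimals applies its default of 2 to the univariate statistics.

```python
import re

# Default decimals from the list above, keyed by the significant
# (capitalized) characters of each statistic name, in lower case.
DEFAULT_DECIMALS = {
    "tt": 3, "ft": 3, "uni": 2, "bprod": 2, "cor": 2, "cov": 2, "coeff": 6,
}


def parse_otherstats(spec):
    """Return {statistic prefix: decimals} for one 'otherstats=' value."""
    decimals = {}
    for name, number in re.findall(r"([A-Za-z]+)(?:\((\d+)\))?", spec):
        name = name.lower()
        if name.startswith("conf"):
            continue  # CONF takes a confidence level, not a decimal count
        for prefix, default in DEFAULT_DECIMALS.items():
            if name.startswith(prefix):
                decimals[prefix] = int(number) if number else default
                break
        else:
            raise ValueError(f"unknown statistic: {name}")
    # BPRODUCT forces the univariate statistics and overrides their decimals.
    if "bprod" in decimals:
        decimals["uni"] = decimals["bprod"]
    return decimals


requested = parse_otherstats("ttests ftest(4) univar(3) correl(3)")
```

Here `requested` maps the T-tests to the default 3 decimals and the other three statistics to the decimals given in parentheses.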
ABBREVIATIONS
Keywords can usually be abbreviated down to the number of
characters required to differentiate them from other keywords.
The keyword for the name of the dependent variable, for instance,
can be given as ‘dependent=’ or ‘dep=’ or even ‘d=’. Either
upper or lower case may be used. In the list of keywords given
above, the minimum set of characters for each keyword is
capitalized.
COMMENTS
Anything on a line beginning with "#" is ignored by the batch
processor and can therefore be used for comments. Blank lines
are also ignored.
MENTION OF KEYWORD SUFFICIENT
The form ‘keyword=yes’ may be shortened to ‘keyword’. That is,
the ‘=yes’ may be omitted for those options which require no
further specification. For example, ‘text=yes’ can be shortened
to ‘text’.
REPETITION OF KEYWORDS
If there is not enough room on a line to list all of the desired
variables, the keyword can be repeated on a new line, and more
variables can be listed. In such a case the second list is
appended to the first list. This appending feature applies only to
the keywords for the independent variables (‘indep=’), the filter
variables (‘filter=’), and the optional statistics (‘otherstats=’).
If any other keyword is repeated, the program will print an error
message and stop.
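The batch-file conventions described above (comment lines, blank lines, a bare keyword standing for ‘keyword=yes’, and repetition of certain keywords) can be summarized in a short Python sketch. It is a simplified illustration, not the SDA batch processor: the function `parse_batch` and the set `REPEATABLE` are hypothetical, keywords are compared exactly as typed (no abbreviation handling), and leading whitespace before "#" is tolerated as an assumption.

```python
# Keywords whose repeated values are appended rather than rejected.
REPEATABLE = {"indep", "filter", "otherstats"}


def parse_batch(text):
    """Parse 'keyword = value' lines into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        key, sep, value = line.partition("=")
        key = key.strip().lower()
        # A keyword given alone is treated as 'keyword=yes'.
        value = value.strip() if sep else "yes"
        if key in settings:
            if key not in REPEATABLE:
                raise ValueError(f"keyword repeated: {key}")
            settings[key] += " " + value  # append the second list
        else:
            settings[key] = value
    return settings


example = """\
# Basic example
dep = spend
indep = age educ

indep = gender
text
"""
settings = parse_batch(example)
```

In this sketch the two ‘indep’ lines are merged into one list, ‘text’ is recorded as ‘yes’, and a second ‘dep’ line would raise an error.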
EXAMPLES OF BATCH FILES
# Basic example
study = /sa/testdata
dep = spend
indep = age, educ gender
savefile = myregress
-----------------------------------
# Redefine some ranges, use weight and filter variables,
# and request descriptive text for the variables.
dep = spend
indep = age(18-30) educ gender
weight= wtvar
filters= var21(1-3) var30(1)
text = yes
savefile = myregress
-----------------------------------
# Request some optional statistics, most with specified decimals
dep = spend
indep = age educ gender
otherstats = ttests ftest(4) univar(3) correl(3)
savefile = myregress
CSM, UC Berkeley
April 12, 2011